정보과학회논문지 (Journal of KIISE)
Korean Title |
트랜스포머와 BERT로 구현한 한국어 형태소 분석기의 성능 분석 |
English Title |
Performance Analysis of Korean Morphological Analyzer based on Transformer and BERT |
Author |
최용석 (Yongseok Choi)
이공주 (Kong Joo Lee)
|
Citation |
Vol. 47, No. 8, pp. 730-741 (Aug. 2020) |
English Abstract |
This paper introduces a Korean morphological analyzer using the Transformer, one of the most popular sequence-to-sequence deep neural models. The Transformer comprises an encoder and a decoder. The encoder compresses a raw input sentence into a fixed-size vector, while the decoder generates a morphological analysis result from the vector. We also replace the encoder with BERT, a pre-trained language representation model. An attention mechanism and a copying mechanism are integrated into the decoder. The processing units of the encoder and the decoder are eojeol-based WordPiece and morpheme-based WordPiece, respectively. Experimental results show that the Transformer with fine-tuned BERT outperforms the randomly initialized Transformer by 2.9% in F1 score. We also investigated the effects of the WordPiece embeddings on morphological analysis when they are not fully updated during the training phase.
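The abstract notes that the encoder consumes eojeol-based WordPiece units while the decoder emits morpheme-based WordPiece units. As a minimal illustration of WordPiece segmentation itself (the greedy longest-match-first algorithm used by BERT; the toy vocabulary and Korean examples below are assumptions for illustration, not the paper's actual vocabulary):

```python
def wordpiece(token, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece segmentation.

    Non-initial subwords carry the conventional "##" continuation
    prefix; a token with no valid segmentation maps to `unk`.
    """
    pieces, start = [], 0
    while start < len(token):
        end, cur = len(token), None
        # Try the longest remaining substring first, then shrink.
        while start < end:
            sub = token[start:end]
            if start > 0:
                sub = "##" + sub
            if sub in vocab:
                cur = sub
                break
            end -= 1
        if cur is None:
            return [unk]
        pieces.append(cur)
        start = end
    return pieces


# Toy vocabulary (hypothetical): under eojeol-based input, the eojeol
# "학교에" (school+at) segments into "학교" + "##에"; a morpheme-based
# input stream would instead present "학교" and "에" as separate tokens.
toy_vocab = {"학교", "##에", "에"}
```

Under this toy vocabulary, `wordpiece("학교에", toy_vocab)` yields `["학교", "##에"]`, while the morpheme-unit token `"에"` stays a single piece, which is the distinction between the two processing units the paper compares.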
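The decoder combines an attention mechanism with a copying mechanism. A dependency-free sketch of one decoding step, in the style of a pointer-generator network rather than the paper's exact formulation (all function and argument names here are hypothetical):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single decoder query.

    query: one vector; keys/values: one vector per source position.
    Returns (context vector, attention weights over source positions).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights

def copy_mix(p_gen, vocab_dist, attn_weights, src_tokens, vocab):
    """Copying mechanism for one decoding step.

    Final P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on
    source positions holding w), so the decoder can emit a source
    WordPiece verbatim even when it is rare in the output vocabulary.
    """
    final = {w: p_gen * p for w, p in zip(vocab, vocab_dist)}
    for w, a in zip(src_tokens, attn_weights):
        final[w] = final.get(w, 0.0) + (1 - p_gen) * a
    return final
```

Because the copy term reuses the attention weights, source WordPieces outside the output vocabulary can still receive probability mass; both terms are proper distributions, so the mixture sums to one.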
|
Keywords |
sequence-to-sequence
Transformer
BERT
attention mechanism
Korean morphological analyzer
copying mechanism
|